Machine Learning

Bayesian Methods for Ecological and Environmental Modelling

Peter Levy
UKCEH Edinburgh

Conventional statistical modelling: we declare

  • the model, f (the mathematical form of the variables x)
  • the algorithm estimates the parameters, \(\theta\)

Machine Learning: we declare

  • the variables x (“features”) to use
  • the algorithm estimates parameters \(\theta\) and the model f

Machine Learning

Machine Learning

https://xkcd.com

Neural Networks

  • inputs = covariates
  • outputs = response variables
  • layers = GLM model predictions

Neural Networks

  • Highly flexible, often uninterpretable
  • Still just a model, but typically have thousands of parameters
  • Not easy to do MCMC computation

Neural Networks

Ideally, we want to:

  • keep flexibility of ML
  • quantify uncertainty
  • incorporate prior knowledge

Some promising approaches

  • Bayesian Additive Regression Trees (BART)
  • Bayesian Adaptive Spline Surfaces (BASS)
  • Bayesian Gaussian Process Regression

Bayesian Additive Regression Trees (BART)

Related to random forests, gradient boosted methods, GAMs

https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.8347

Bayesian Additive Regression Trees (BART)

https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.8347

BART formal definition

\[y = \sum_{j=1}^m g(X, T_j, M_j) + \epsilon\]

  • \(m\) is the number of trees (fixed)
  • \(X\) is a matrix of covariates
  • \(T_j\) is a vector of binary split decision rules
  • \(M_j\) is a vector of terminal node weights
  • \(g()\) is a look-up function
  • \(\epsilon \sim N(0, \sigma^2)\) is an error term

Bayesian Additive Regression Trees (BART)

Very simple to implement:

library(BART)
bart_model <- wbart(x.train, y.train, x.test)

Try this in the next practical (time permitting) …

The perils of machine learning

  • avoids using prior knowledge of causation
  • depends critically on cross-validation
    • “out-of-sample” predictions are really a sub-sample of the data
    • pseudo-out-of-sample” predictions
  • depends critically on “Big Data”
    • i.e. representative sample of population to be predicted
  • requires uncertainty quantification

Observations are not enough

Heliocentric model

Geocentric model

Both models match observations but which is more likely?

Causal inference

Simpson’s paradox

Causal information needed to make sense of the data

Causal inference

Simpson’s paradox

Causal information needed to make sense of the data

Causal inference

Simpson’s paradox

Causal information needed to make sense of the data

Is machine learning science?

Science, from the Latin:

  • verb sciere to know

A systematic process for gaining knowledge about the real world.

How do we know anything?

How do we know anything?

How science works

Summary

  • Machine learning uses a model
  • flexible structure, many parameters
  • very powerful with Big Data
  • often uninterpretable
  • probalistic uncertainty is a work in progress
  • lack of causal information is a problem
    • cross-validation is limited by sample representativeness

Miscellaneous

Reporting Bayesian analyses

  1. The data
    • methods, key observational uncertainties
  2. The model
    • \(f\) function(s), \(x\) variables, \(\theta\) parameters
  3. Priors
    • All parameters or a subset? What joint probability distribution did you assign? What information did you use to quantify it: literature, personal judgment, expert elicitation?

Reporting Bayesian analyses

  1. Likelihood function
    • what likelihood function did you assign? How did you account for stochastic, systematic and representativeness errors?
  2. MCMC
    • Which algorithm did you use? How many chains and iterations? How did you assess convergence?

Reporting Bayesian analyses

  1. Posterior distribution for the parameters
    • What were posterior modes, means, variances and major correlations? How different was the posterior from the prior?
  2. Posterior predictions
    • when using the posterior probability distribution for the parameters, how well did the model(a) reproduce data used in the calibration, (b) predict data not used in the calibration?

What software to use?

What software to use?

Resources

Course & book: Statistical Rethinking, Richard McElreath

Causal inference: Causal Inference in Statistics: A Primer, Judea Pearl et al.